feat: pipeline accepts vep.tar.gz or vep/ dir #105

Open
emmcauley wants to merge 4 commits into em_nice_to_haves from em_vep_cache

Conversation

emmcauley commented May 7, 2026
Collaborator

Closes #96.

We can't use the nf-core UNTAR module because it applies --strip-components 1 when extracting, which strips the top-level directory from the archive. VEP requires the species/ subdirectory to be present under the cache root. We don't run VEP as part of the pipeline tests; I picked up on this with Claude.
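
The effect of --strip-components can be reproduced outside the pipeline. The following is a minimal sketch, not taken from this PR: it assumes a toy archive whose top-level directory is the species directory (homo_sapiens/113_GRCh38/ with a placeholder info.txt), and the workdir, plain, and stripped names are illustrative only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Build a toy archive with an assumed layout: homo_sapiens/113_GRCh38/info.txt.
# This layout is an illustration, not the contents of a real VEP cache tarball.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/homo_sapiens/113_GRCh38"
touch "$workdir/src/homo_sapiens/113_GRCh38/info.txt"
tar -czf "$workdir/cache.tar.gz" -C "$workdir/src" homo_sapiens

# Plain extraction keeps the species directory under the extraction root.
mkdir -p "$workdir/plain"
tar -xzf "$workdir/cache.tar.gz" -C "$workdir/plain"

# --strip-components=1 drops the first path component of every entry,
# so the species directory disappears from the extracted tree.
mkdir -p "$workdir/stripped"
tar -xzf "$workdir/cache.tar.gz" -C "$workdir/stripped" --strip-components=1

# List the resulting directory trees for comparison.
find "$workdir/plain" "$workdir/stripped" -type d | sort
```

In this toy case the plain extraction contains homo_sapiens/113_GRCh38, while the stripped extraction contains only 113_GRCh38 directly under its root.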

github-actions Bot commented May 7, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 2711c2d

✅ 98 tests passed
❔ 57 tests were ignored
❗ 14 tests had warnings
Details

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • files_exist - File not found: ro-crate-metadata.json
  • readme - README did not have an nf-core template version badge.
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in README.md: Include a figure that guides the user through the major workflow steps. Many nf-core
  • pipeline_todos - TODO string in README.md: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file.
  • pipeline_todos - TODO string in main.nf.test: Once you have added the required tests, please run the following command to build this file:
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.

❔ Tests ignored:

  • files_exist - File is ignored: .editorconfig
  • files_exist - File is ignored: .github/.dockstore.yml
  • files_exist - File is ignored: .github/CONTRIBUTING.md
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_exist - File is ignored: .github/PULL_REQUEST_TEMPLATE.md
  • files_exist - File is ignored: .github/actions/get-shards/action.yml
  • files_exist - File is ignored: .github/actions/nf-test/action.yml
  • files_exist - File is ignored: .github/workflows/branch.yml
  • files_exist - File is ignored: .github/workflows/ci.yml
  • files_exist - File is ignored: .github/workflows/linting.yml
  • files_exist - File is ignored: .github/workflows/linting_comment.yml
  • files_exist - File is ignored: .github/workflows/nf-test.yml
  • files_exist - File is ignored: .prettierignore
  • files_exist - File is ignored: .prettierrc.yml
  • files_exist - File is ignored: CHANGELOG.md
  • files_exist - File is ignored: CITATIONS.md
  • files_exist - File is ignored: CODE_OF_CONDUCT.md
  • files_exist - File is ignored: LICENSE
  • files_exist - File is ignored: assets/email_template.html
  • files_exist - File is ignored: assets/email_template.txt
  • files_exist - File is ignored: assets/nf-core-twistcgp_logo_light.png
  • files_exist - File is ignored: assets/sendmail_template.txt
  • files_exist - File is ignored: conf/igenomes.config
  • files_exist - File is ignored: conf/igenomes_ignored.config
  • files_exist - File is ignored: conf/test_full.config
  • files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_dark.png
  • files_exist - File is ignored: docs/images/nf-core-twistcgp_logo_light.png
  • files_exist - File is ignored: docs/output.md
  • files_exist - File is ignored: docs/README.md
  • files_exist - File is ignored: docs/usage.md
  • nextflow_config - nextflow_config
  • nf_test_content - nf_test_content
  • files_unchanged - File ignored due to lint config: CODE_OF_CONDUCT.md
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/.dockstore.yml
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/config.yml
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File does not exist: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File does not exist: assets/sendmail_template.txt
  • files_unchanged - File ignored due to lint config: assets/nf-core-twistcgp_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-twistcgp_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_nf_test - actions_nf_test
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/twistcgp/twistcgp/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • rocrate_readme_sync - rocrate_readme_sync

✅ Tests passed:

  • files_exist - File found: .gitattributes
  • files_exist - File found: .gitignore
  • files_exist - File found: .nf-core.yml
  • files_exist - File found: nextflow_schema.json
  • files_exist - File found: nextflow.config
  • files_exist - File found: README.md
  • files_exist - File found: conf/modules.config
  • files_exist - File found: conf/test.config
  • files_exist - File found: nf-test.config
  • files_exist - File found: tests/default.nf.test
  • files_exist - File found: main.nf
  • files_exist - File found: assets/multiqc_config.yml
  • files_exist - File found: conf/base.config
  • files_exist - File found: modules.json
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
  • files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
  • files_exist - File not found check: .github/workflows/push_dockerhub.yml
  • files_exist - File not found check: .markdownlint.yml
  • files_exist - File not found check: .nf-core.yaml
  • files_exist - File not found check: .yamllint.yml
  • files_exist - File not found check: bin/markdown_to_html.r
  • files_exist - File not found check: conf/aws.config
  • files_exist - File not found check: docs/images/nf-core-twistcgp_logo.png
  • files_exist - File not found check: lib/Checks.groovy
  • files_exist - File not found check: lib/Completion.groovy
  • files_exist - File not found check: lib/NfcoreTemplate.groovy
  • files_exist - File not found check: lib/Utils.groovy
  • files_exist - File not found check: lib/Workflow.groovy
  • files_exist - File not found check: lib/WorkflowMain.groovy
  • files_exist - File not found check: lib/WorkflowTwistcgp.groovy
  • files_exist - File not found check: parameters.settings.json
  • files_exist - File not found check: pipeline_template.yml
  • files_exist - File not found check: Singularity
  • files_exist - File not found check: lib/nfcore_external_java_deps.jar
  • files_exist - File not found check: .travis.yml
  • files_unchanged - .gitattributes matches the template
  • files_unchanged - .prettierrc.yml matches the template
  • pipeline_if_empty_null - No ifEmpty(null) strings found
  • plugin_includes - No wrong validation plugin imports have been found
  • pipeline_name_conventions - Name adheres to nf-core convention
  • template_strings - Did not find any Jinja template strings (0 files)
  • schema_lint - Schema lint passed
  • schema_lint - Schema title + description lint passed
  • schema_lint - Input mimetype lint passed: 'text/csv'
  • schema_params - Schema matched params returned from nextflow config
  • system_exit - No System.exit calls found
  • actions_schema_validation - Workflow validation passed: linting.yml
  • actions_schema_validation - Workflow validation passed: linting_comment.yml
  • actions_schema_validation - Workflow validation passed: twistgp_ci.yml
  • merge_markers - No merge markers found in pipeline files
  • modules_json - Only installed modules found in modules.json
  • multiqc_config - assets/multiqc_config.yml found and not ignored.
  • multiqc_config - assets/multiqc_config.yml contains report_section_order
  • multiqc_config - assets/multiqc_config.yml contains export_plots
  • multiqc_config - assets/multiqc_config.yml contains report_comment
  • multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
  • multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
  • modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
  • local_component_structure - local subworkflows directory structure is correct 'subworkflows/local/TOOL/SUBTOOL'
  • base_config - conf/base.config found and not ignored.
  • modules_config - conf/modules.config found and not ignored.
  • modules_config - ALIGNBAM found in conf/modules.config and Nextflow scripts.
  • modules_config - BCFTOOLS_VIEW_PRE_CIVIC found in conf/modules.config and Nextflow scripts.
  • modules_config - BCFTOOLS_VIEW_POST_CIVIC found in conf/modules.config and Nextflow scripts.
  • modules_config - BWAMEM2_INDEX found in conf/modules.config and Nextflow scripts.
  • modules_config - CIVICPY_UPDATE_CACHE found in conf/modules.config and Nextflow scripts.
  • modules_config - CIVICPY_ANNOTATE_VCF found in conf/modules.config and Nextflow scripts.
  • modules_config - CNVKIT_BATCH found in conf/modules.config and Nextflow scripts.
  • modules_config - UNTAR_VEP_CACHE found in conf/modules.config and Nextflow scripts.
  • modules_config - ENSEMBLVEP_DOWNLOAD found in conf/modules.config and Nextflow scripts.
  • modules_config - ENSEMBLVEP_VEP found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_MUTECT2 found in conf/modules.config and Nextflow scripts.
  • modules_config - GATK4_FILTERMUTECTCALLS found in conf/modules.config and Nextflow scripts.
  • modules_config - FASTP found in conf/modules.config and Nextflow scripts.
  • modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
  • modules_config - FGBIO_FASTQTOBAM found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSORPRO_SCAN found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSOR2_MSI found in conf/modules.config and Nextflow scripts.
  • modules_config - MSISENSORPRO_PRO found in conf/modules.config and Nextflow scripts.
  • modules_config - PERBASE found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_COLLECTHSMETRICS found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_COLLECTMULTIPLEMETRICS found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_INTERVALLISTTOBED found in conf/modules.config and Nextflow scripts.
  • modules_config - PICARD_MARKDUPLICATES found in conf/modules.config and Nextflow scripts.
  • modules_config - SAMTOOLS_FAIDX found in conf/modules.config and Nextflow scripts.
  • modules_config - SAMTOOLS_DICT found in conf/modules.config and Nextflow scripts.
  • modules_config - SNPEFF_DOWNLOAD found in conf/modules.config and Nextflow scripts.
  • modules_config - SNPEFF_SNPEFF found in conf/modules.config and Nextflow scripts.
  • modules_config - TMB found in conf/modules.config and Nextflow scripts.
  • modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
  • modules_config - TWISTCGP found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_POPULATION_GERMLINE found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_PON found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_COSMIC found in conf/modules.config and Nextflow scripts.
  • modules_config - TABIX_GNOMAD found in conf/modules.config and Nextflow scripts.
  • nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
  • nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.3.2

Run details

  • nf-core/tools version 3.3.2
  • Run at 2026-06-08 19:50:08

@emmcauley

Collaborator Author

@coderabbitai review

coderabbitai Bot commented May 7, 2026
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot commented May 7, 2026

📝 Walkthrough

The PR adds support for accepting the Ensembl VEP cache as either a pre-extracted directory or a .tar.gz archive. A new UNTAR_VEP_CACHE module extracts tarballs automatically before VEP runs. The main workflow refactors cache channel construction to branch conditionally: .tar.gz inputs are extracted via the module, while directory paths are used directly or fall back to the annotation database output. The documentation and parameter schema are updated to describe both accepted input modes.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly summarizes the main change: pipeline now accepts VEP cache as either a tarball or directory. |
| Description check | ✅ Passed | Description explains the feature, links to the related issue, and provides context for implementation choice. |
| Linked Issues check | ✅ Passed | Changes fully implement issue #96's requirement: pipeline now accepts VEP cache as .tar.gz files or directories with automatic extraction. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the VEP cache tarball/directory feature. Documentation, schema, module config, and new UNTAR_VEP_CACHE module are all on-scope. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
main.nf (1)

134-134: ⚡ Quick win

ensemblvep_cache take parameter is declared but never used.

The FULCRUMGENOMICS_TWISTCGP workflow accepts ensemblvep_cache as a take: input (line 134), but the new if/else block reads params.ensemblvep_cache directly and never references the channel parameter. This makes the take parameter dead code and couples the inner workflow to global params, reducing reusability.

Consider either: (a) using the ensemblvep_cache channel parameter in the if/else logic and removing the direct params access, or (b) removing the ensemblvep_cache take parameter entirely and updating the caller at line 97.

Also applies to: 170-181

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.nf` at line 134, The workflow FULCRUMGENOMICS_TWISTCGP declares an input
channel ensemblvep_cache but the if/else logic reads params.ensemblvep_cache
directly, making the channel unused; update the conditional and downstream uses
to consume the ensemblvep_cache channel instead of params.ensemblvep_cache (e.g.
use ensemblvep_cache.first() or check ensemblvep_cache.empty? as appropriate) so
the workflow is driven by its input channel, and apply the same change to the
duplicate logic around the block that mirrors lines 170-181; alternatively, if
you prefer the global param, remove the ensemblvep_cache take: declaration and
update callers accordingly—pick one approach and make the code consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@main.nf`:
- Around line 171-175: The UNTAR_VEP_CACHE process is receiving a single-element
list because `.collect()` wraps the 2-element tuple emitted by
Channel.fromPath(...).map { [[id: 'vep_cache'], it] } into a list; remove the
`.collect()` so the channel supplies a destructurable 2-element tuple to
UNTAR_VEP_CACHE. In short: change the code that builds the input channel for
UNTAR_VEP_CACHE (the Channel.fromPath(params.ensemblvep_cache).map { [[id:
'vep_cache'], it] } expression) to omit `.collect()` so UNTAR_VEP_CACHE receives
tuple val(meta), path(archive) as expected.

In `@modules/nf-core/untar/main.nf`:
- Around line 53-62: The single-top-level-dir branch currently writes files
using the original archive paths so ${prefix} stays empty; change the loop
handling when the test on ${archive} is true to strip the first path component
from each archive entry and create files/dirs under ${prefix} (i.e., derive a
variable like stripped=$(echo "${i}" | sed -E 's#^[^/]+/##') and then use
${prefix}/$stripped for mkdir -p and touch), ensuring directories and files are
created inside ${prefix} rather than at the archive root; update the branch that
iterates over tar -tf ${archive} to use this stripped path logic for both files
and directories.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 991e27e3-1833-42cf-a8da-2b8298feee49

📥 Commits

Reviewing files that changed from the base of the PR and between 65969ea and 3b5fc95.

⛔ Files ignored due to path filters (1)
  • modules/nf-core/untar/tests/main.nf.test.snap is excluded by !**/*.snap
📒 Files selected for processing (8)
  • docs/variant_annotation.md
  • main.nf
  • modules.json
  • modules/nf-core/untar/environment.yml
  • modules/nf-core/untar/main.nf
  • modules/nf-core/untar/meta.yml
  • modules/nf-core/untar/tests/main.nf.test
  • nextflow_schema.json

Comment thread main.nf
Comment thread modules/nf-core/untar/main.nf Outdated
@emmcauley emmcauley force-pushed the em_vep_cache branch 2 times, most recently from 6b09a9b to e0939bf Compare May 14, 2026 19:19
Comment thread workflows/twistcgp.nf Outdated
// MODULE: PERBASE
//
- PERBASE(ALIGNBAM.out.bam_bai, ch_fasta.join(ch_fasta_fai).first())
+ PERBASE(ALIGNBAM.out.bam_bai, ch_fasta.join(ch_fasta_fai))

Collaborator

question(blocking)
Why'd we drop the .first() here?

I believe .join returns a queue channel, so without .first() this will only process the first sample.

Is this doing what we want? It also seems unrelated to the PR.

Comment thread main.nf Outdated
Comment on lines +168 to +171
channel.fromPath(params.ensemblvep_cache)
.map { it -> [[id: 'vep_cache'], it] }
.collect()
)

Collaborator

issue(blocking)

CodeRabbit is correct about the bug, but its solution leaves this as a queue channel, so it is wrong in a different way.

Suggested change
- channel.fromPath(params.ensemblvep_cache)
- .map { it -> [[id: 'vep_cache'], it] }
- .collect()
- )
+ channel.fromPath(params.ensemblvep_cache)
+ .collect { it -> [[id: 'vep_cache'], it] }
+ )

or

Suggested change
- channel.fromPath(params.ensemblvep_cache)
- .map { it -> [[id: 'vep_cache'], it] }
- .collect()
- )
+ channel.value(file(params.ensemblvep_cache))
+ .map { it -> [[id: 'vep_cache'], it] }
+ )

Comment thread main.nf Outdated
.map { it -> [[id: 'vep_cache'], it] }
.collect()
)
ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()

Collaborator

issue(blocking)

Suggested change
- ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()
+ ch_vep_cache = UNTAR_VEP_CACHE.out.cache.first()

Same idea as above.

However, I think Nextflow will actually implicitly preserve value channel status if all the inputs to a process are value channels. I don't see that documented anywhere, though, so we shouldn't rely on it.

process VALUE_IN_VALUE_OUT {
    input:
    tuple val(meta), path(some_file)

    output:
    tuple val(meta), path("output.txt"), emit: output

    script:
    """
    touch output.txt
    """
}


workflow {
    ch = channel.fromPath("somefile.txt")
        .collect { it -> [[id: "file"], it] }
    print(ch)
    ch.view()
    VALUE_IN_VALUE_OUT(ch)

    ch2 = VALUE_IN_VALUE_OUT.out.output
    print(ch2)
    ch2.view()
}

Output:

DataflowVariable(value=null)
DataflowVariable(value=null)
executor >  local (1)
[fb/41b205] VALUE_IN_VALUE_OUT | 1 of 1 ✔
[['id':'file'], /.../somefile.txt]
[[id:file], /.../work/fb/41b2054051393a8611a540b98dd9d5/output.txt]

Comment thread main.nf Outdated
ch_vep_cache = UNTAR_VEP_CACHE.out.cache.collect()
} else {
ch_vep_cache = params.ensemblvep_cache
? channel.fromPath(params.ensemblvep_cache).map { it -> [[id: 'vep_cache'], it] }.collect()

Collaborator

issue(blocking)

Suggested change
- ? channel.fromPath(params.ensemblvep_cache).map { it -> [[id: 'vep_cache'], it] }.collect()
+ ? channel.fromPath(params.ensemblvep_cache).collect { it -> [[id: 'vep_cache'], it] }

@znorgaard znorgaard assigned emmcauley and unassigned znorgaard Jun 4, 2026
emmcauley commented Jun 8, 2026
Collaborator Author

I ran this branch with both an uncompressed directory and a vep.tar.gz. Here's how the cache path resolves in each case:

tarball

Cache: /private/tmp/nf_work_vep_tarball/37/37d02b0f04fb8eefeb155756dc16b6/cache/homo_sapiens/113_GRCh38; homo_sapiens_core_113_38 on ensembldb.ensembl.org

directory

Cache: /private/tmp/nf_work_vep_dir/c8/d98dc0180fa9230e7aada185897bb8/vep/homo_sapiens/113_GRCh38; homo_sapiens_core_113_38 on ensembldb.ensembl.org

@znorgaard

Collaborator

@coderabbitai review

coderabbitai Bot commented Jun 8, 2026
✅ Action performed

Review finished.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
main.nf (1)

173-174: ⚡ Quick win

Use consistent pattern for directory paths.

The tarball branch (line 168-169) uses channel.value(file(...)).map {...} while the directory branch uses channel.fromPath(...).collect {...}. Both produce value channels but the pattern inconsistency is confusing. Also, fromPath is designed for glob patterns; for directory paths, file() wrapped in channel.value() is more explicit.

♻️ Align with tarball branch pattern
     ch_vep_cache = params.ensemblvep_cache
-        ? channel.fromPath(params.ensemblvep_cache).collect { it -> [[id: 'vep_cache'], it] }
+        ? channel.value(file(params.ensemblvep_cache)).map { it -> [[id: 'vep_cache'], it] }
         : PREPARE_ANNOTATION_DB.out.ensemblvep_cache
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@main.nf` around lines 173 - 174, ch_vep_cache uses channel.fromPath(...) for
the directory branch while the tarball branch uses
channel.value(file(...)).map(...); change the directory branch to the same
explicit pattern by wrapping params.ensemblvep_cache with file(...) and
channel.value(...) and using .map to produce the [[id:'vep_cache'], it] tuple so
both branches use the same value-channel approach (refer to ch_vep_cache and
params.ensemblvep_cache).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modules/local/untar_vep_cache/main.nf`:
- Around line 22-25: After extracting ${archive} into _tmp, validate that there
is exactly one top-level entry before running mv: list entries in _tmp (the
current code's top_level capture), count them, and if the count != 1 emit an
error (including the unexpected entries) and exit non-zero; only when count == 1
proceed to set top_level and mv "_tmp/${top_level}" cache. Reference the
existing variables/commands top_level, _tmp, ${archive}, tar and mv to locate
where to add this validation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dcdcf3da-ff65-4283-9be8-ca4b82e0dab5

📥 Commits

Reviewing files that changed from the base of the PR and between 3b5fc95 and 2711c2d.

📒 Files selected for processing (8)
  • CHANGELOG.md
  • conf/modules.config
  • docs/variant_annotation.md
  • main.nf
  • modules/local/untar_vep_cache/environment.yml
  • modules/local/untar_vep_cache/main.nf
  • modules/local/untar_vep_cache/meta.yml
  • nextflow_schema.json
✅ Files skipped from review due to trivial changes (4)
  • CHANGELOG.md
  • modules/local/untar_vep_cache/environment.yml
  • modules/local/untar_vep_cache/meta.yml
  • docs/variant_annotation.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • nextflow_schema.json

Comment on lines +22 to +25
mkdir -p _tmp
tar -xzf ${archive} -C _tmp
top_level=\$(ls _tmp | head -1)
mv "_tmp/\${top_level}" cache

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate exactly one top-level entry after extraction.

The script assumes the tarball contains exactly one top-level directory. If the archive has multiple top-level entries, only the first is moved and the others are silently lost. If it has no entries or an unexpected structure, mv fails.

🛡️ Add validation before moving
 mkdir -p _tmp
 tar -xzf ${archive} -C _tmp
+entry_count=\$(ls -A _tmp | wc -l)
+if [ \$entry_count -ne 1 ]; then
+    echo "Error: Expected exactly one top-level entry in tarball, found \$entry_count" >&2
+    exit 1
+fi
 top_level=\$(ls _tmp | head -1)
 mv "_tmp/\${top_level}" cache
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- mkdir -p _tmp
- tar -xzf ${archive} -C _tmp
- top_level=\$(ls _tmp | head -1)
- mv "_tmp/\${top_level}" cache
+ mkdir -p _tmp
+ tar -xzf ${archive} -C _tmp
+ entry_count=\$(ls -A _tmp | wc -l)
+ if [ \$entry_count -ne 1 ]; then
+     echo "Error: Expected exactly one top-level entry in tarball, found \$entry_count" >&2
+     exit 1
+ fi
+ top_level=\$(ls _tmp | head -1)
+ mv "_tmp/\${top_level}" cache
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modules/local/untar_vep_cache/main.nf` around lines 22 - 25, After extracting
${archive} into _tmp, validate that there is exactly one top-level entry before
running mv: list entries in _tmp (the current code's top_level capture), count
them, and if the count != 1 emit an error (including the unexpected entries) and
exit non-zero; only when count == 1 proceed to set top_level and mv
"_tmp/${top_level}" cache. Reference the existing variables/commands top_level,
_tmp, ${archive}, tar and mv to locate where to add this validation.
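
The single-top-level-entry assumption could also be checked against the archive listing before anything is extracted. The following is a minimal sketch, not part of this PR: count_top_level is a hypothetical helper, and the one.tar.gz and two.tar.gz archives are toy inputs built only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_top_level is a hypothetical helper (not in the PR). It prints the
# number of distinct first path components listed in a gzipped tarball.
count_top_level() {
    tar -tzf "$1" \
        | sed -e 's#^\./##' -e '/^$/d' \
        | cut -d/ -f1 \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Toy archives: one with a single top-level directory, one with two.
demo=$(mktemp -d)
mkdir -p "$demo/one/vep/homo_sapiens" "$demo/two/a" "$demo/two/b"
touch "$demo/one/vep/homo_sapiens/info.txt" "$demo/two/a/x" "$demo/two/b/y"
tar -czf "$demo/one.tar.gz" -C "$demo/one" vep
tar -czf "$demo/two.tar.gz" -C "$demo/two" a b

single=$(count_top_level "$demo/one.tar.gz")
multi=$(count_top_level "$demo/two.tar.gz")
echo "one.tar.gz: $single top-level entries"
echo "two.tar.gz: $multi top-level entries"
```

Checking the listing first would let a process fail before writing any files, at the cost of reading the archive twice. Whether that trade-off suits a multi-gigabyte cache is a judgment call for the module author.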

2 participants